\documentclass[12pt]{article}
\usepackage{layout,latexsym, array, enumerate, amsmath, amsthm,amssymb, amsfonts, natbib, subfigure, color, float}
\usepackage[mathscr]{eucal}
\usepackage{epsf,epsfig}
%\usepackage{epsf,epsfig,eufrak,dsfont}
%\bibliographystyle{apalike}

\definecolor{grey}{RGB}{190,190,190}


\usepackage[margin=0.8in]{geometry}

\newtheorem{theorem}{Theorem}[section]
\newtheorem{proposition}{Proposition}[section]
\newtheorem{corollary}{Corollary}[section]
\newtheorem{example}{Example}[section]
\newtheorem{lemma}{Lemma}[section]
\newtheorem{defn}{Definition}[section]
\newcommand{\lowtilde}[1]{\mathop{#1}\limits_{\textstyle\tilde{}}}
%\renewcommand{\baselinestretch}{1.4}

\newcommand{\off}{\mathcal{O}}
\newcommand{\poly}{{\cal P}(u-s;\mathbf{a})}
\newcommand{\coef}{\mathbf{a}}
\newcommand{\lik}{\ensuremath{\mathcal{L}}}
\newcommand{\map}{\ensuremath{\mathcal{M}}}
\newcommand{\area}[1]{\ensuremath{|\!|#1|\!|}}
\newcommand{\norm}[1]{\ensuremath{\left|\!\left|#1\right|\!\right|_1}}
\newcommand{\dee}[1]{\ensuremath{\,\mathrm{d}#1}}
\newcommand{\Steve}[1]{{\tt (#1)}}

\newcommand{\comment}[1]{}

\bibliographystyle{newapa}

\begin{document}
\title{Kernel Smoothing Spatially Aggregated Data with Time-Varying Boundaries}
\author{Paul Nguyen\footnote{University of Toronto and Cancer Care Ontario}, Patrick E. Brown\footnote{University of Toronto and Cancer Care Ontario},  Jamie Stafford\footnote{University of Toronto}, and Chun-Po Steve Fan\footnote{Bank of Montreal}}
\thispagestyle{empty}

\maketitle 

\textsc{Keywords:} local likelihood; local-\textsc{EM}; kernel smoothing; disease mapping; nonparametric.

\section{Introduction}

Disease mapping often requires working with spatially censored aggregated data, as data may be aggregated into regions to protect confidentiality or only collected at an area level such as health region or postal code. These regions often vary in size, being larger in rural areas. Canadian cities have six-digit postal codes that are able to locate an individual precisely, but rural and remote areas have postal codes that cover not only large geographical extents but larger populations than urban postal codes. Furthermore, estimation of disease risk must take the underlying population distribution into account, and spatially aggregated census data are often used for this purpose. Often, the regions the census data are aggregated to are different from the censoring regions for the case data.

Estimating disease risk for small areas or rare diseases requires using data collected over a long time period in order to accumulate sufficient cases to allow for accurate estimation. As postal codes and census regions change over time, this can cause problems due to the need to combine maps with different boundaries or tessellations.  Disease mapping on large, stable areas avoids these problems, though small area analysis would involve finer-scale census regions which are more volatile.  For instance, the Canadian census Dissemination Areas (formerly known as Enumeration Areas before 2001) contain roughly 400 individuals each and number in the tens of thousands.  The census is held every 5 years, and at each census, a substantial minority of the regions are changed, most commonly in fast growing suburban areas.

This paper implements a local-\textsc{EM} algorithm for spatial data with multiple types of aggregation, motivated by locally constant Poisson regression. The algorithm builds on the methods of \citet{steve}, where their implementation was shown to be related to the \textsc{EMS} algorithm introduced by \citet{silvermanems}. A local-\textsc{EM} algorithm is used to estimate the risk surface on a tessellation in which each of the cumulative intersections of the various maps are distinct regions. At each iteration, the expected number of cases in each of the regions of the fine tessellation are computed and the risk surface is re-estimated.

\subsection{Motivating Application}

This work is motivated by a desire to make inference upon the spatial distribution of mesothelioma lung cancer risk in the counties of Lambton and Middlesex located in Ontario, Canada.  Mesothelioma is a rare type of cancer, primarily caused by exposure to asbestos fibers \citep{murthy1999}, and the incidence rate in Lambton county is higher than in the rest of Ontario. Spatial data on mesothelioma incidence are available from 1985 to 2007, with 161 and 82 cases in Lambton County and Middlesex County, respectively, over this period.  Shortening the study period would reduce an already small dataset, and consequently, reduce statistical power. Hence, a methodology capable of accommodating the different geographical boundaries in the 5 censuses held over the period is needed.

The exact locations of the mesothelioma cases are not available and only six-digit postal codes are used. This is partly due to privacy concerns, with the release of full street addresses more tightly controlled than postal codes, and \textbf{partly because street addresses do not form part of the postal address for rural locations and as a result are often not collected. [needs more clarity]} Furthermore, in the earlier years postal codes were not collected for all the cases. The Ontario Cancer Registry collects case information from a variety of sources (e.g., death certificates, hospital records, lab results), and the year where postal codes were first reported varies from source to source. For all cases, however, coarser spatial information (Statistics Canada's Standard Geographical Classification) is available.

The census Dissemination Areas (DA's) are the finest level at which population counts are released. Boundaries for postal code regions are not defined, but rather a Postal Code Conversion File (PCCF) lists the DA or DA's where an address with a given postal code could be located in. In urban areas, Sarnia and London in this dataset, there is essentially a one-to-one correspondence between postal codes and DA's.  In rural areas, however, a single postal code can cover a large area of rural postal routes or a general delivery post office. Figure \ref{fig:pc}a shows the study region, denoting the urban and rural areas. Figures \ref{fig:pc}b-f show the census region boundaries for each of the census years in the study period, along with the regions covered by the two rural postal codes, N0N1G0 and N0N1R0. The city of Sarnia, in the upper left corner, contains roughly 120 DA's which are indistinguishable and solid black on the map. The EA boundaries for the 1986 census have never been digitized, so a combination of Census Tracts (CT's) and Census Subdivisions (CSD's) are used here. Notice that the regions covered by the postal codes are not always contiguous and they often overlay. 

\begin{figure}[htp]
\centering
\subfigure[Study region]{\includegraphics[width=0.31\textwidth]{mapRegionBW.png}}
\subfigure[1986]{\includegraphics[trim = 0mm 10mm 0mm 15mm, clip, width=0.31\textwidth]{pcRegion1986.pdf}}
\subfigure[1991]{\includegraphics[trim = 0mm 10mm 0mm 15mm, clip, width=0.31\textwidth]{pcRegion1991.pdf}}
\subfigure[1996]{\includegraphics[trim = 0mm 10mm 0mm 15mm, clip, width=0.31\textwidth]{pcRegion1996.pdf}}
\subfigure[2001]{\includegraphics[trim = 0mm 10mm 0mm 15mm, clip, width=0.31\textwidth]{pcRegion2001.pdf}}
\subfigure[2006]{\includegraphics[trim = 0mm 10mm 0mm 15mm, clip, width=0.31\textwidth]{pcRegion2006.pdf}}
\caption{Study region and boundaries of census regions: Census Tracts and Census Subdivisions (1986); Enumeration Areas (1991, 1996); and Dissemination Areas (2001, 2006).}
\label{fig:pc}
\end{figure}

In order to make inference on the spatial distribution of mesothelioma based on the postal code data, this paper builds on the local likelihood or kernel smoothing methodology of \citet{steve} in several important ways. A one-to-one correspondence between census regions and postal code regions is not assumed. Bandwidth selection and quantification of uncertainty are accomplished through cross validation and bootstrapping. Finally, the regional boundaries are not forced to lie on a grid. 

This kernel smoothing problem is cast in the local likelihood and local-\textsc{EM} framework to provide a model-based justification for the treatment of spatio-temporal variation in the population size and composition, as well as a temporal trend. As with \citet{steve}, the local-\textsc{EM} algorithm is shown to be equivalent to the \textsc{EMS} algorithm of \citet{silvermanems} and this latter algorithm is used on the data. The high dimensionality of the problem results in the algorithm being both memory and computationally intensive, though unlike Markov Chain Monte Carlo (MCMC) approximations, it can be parallelized to a large degree and assessing convergence is trivial. 

\section{Methods}

The development of local likelihood and kernel methods for spatial and spatially aggregated data may be found in a sequence of papers attributed to \citet{brillinger1990st, brillinger1991si, brillinger1994esp} and the methods of \citet{silvermanems}. While Brillinger was concerned with spatially smoothing data that had been aggregated into the regions of a single map and \citet{silvermanems} focused on the reconstruction of a single image, our methods permit multiple maps with differing tessellations to be combined. It should also be noted that with stable boundaries the \citet*{bym} model with Bayesian inference is widely used \citep[see also][]{lawson}.

\subsection{The Model}

Let $(S_{jk},T_{jk})$ be the location in space and time for case $k$ in the $j$th age and sex group, and $P_j(s,t)$ and $\Lambda_j(s,t)$ be the population and risk surface, respectively, for location $s$ and time $t$. The locations and times are realisations from a spatio-temporal inhomogeneous Poisson process with
\begin{align*}
\{S_{jk},  T_{jk} ;k=1...N_j\} &\sim \text{Poisson Process}[\Lambda_j(s,t) P_j(s,t)]\\
\Lambda_j(s,t) &= \Gamma(t)\Theta_j \lambda(s)\\
\end{align*}
Here, $\Gamma(t)$ represents variations in risk over time, reflecting at least in part improvements in detection and data collection, and $\Theta_j$ is the risk for age and sex group $j$.  The $\lambda(s)$ surface, constant over time and over age-sex groups, is the spatial relative risk surface.  A Log Gaussian Cox Process would result from modelling $\lambda(s)$ as an exponentiated Gaussian random field, an approach taken by \citet{lupus}.  Here, we derive a nonparametric estimator of $\lambda(s)$.

Estimates for $\Gamma(t)$ and $\Theta_j$ are derived from an external dataset and treated as known {\em a priori}. Since Lambton County has unusual high incidence counts than the rest of Ontario, its cases and population were removed from the dataset and Ontario census, respectively, to provide a better estimate of $\Gamma(t)$. In latter years, these cases for each age and sex group $j$ should be relatively proportional with the rest of Ontario, so using the complete dataset and census should not affect the estimate of $\Theta_j$. The $\hat\Theta_j$ are the total number of cases in group $j$ in the province of Ontario during 2007 divided by the population for that group from the 2006 census. The time trend is estimated over census periods $i=1,\ldots,5$. For period $C_1 = [1985,1988)$, for instance,  we define $\Gamma_1=\int_{1985}^{1988} \Gamma(t) dt$ and estimate $\hat\Gamma_i$ from the provincial data.  Writing $\bar P_{ij}$ as the total population of the $j$th group in Ontario at the $i$th census, then $\hat\Gamma_i$ is the total number of cases during the $i$th census period divided by the expected number $E_i = |C_i|\sum_j \hat\Theta_j P_{ij}$.

Given the $\hat\Theta_j$ and $\hat\Gamma_i$, sufficient statistics for the likelihood of $\lambda(\cdot)$ are the locations $Y_{ik}$ of the $k$th case during census period $i$. In other words, the event times and the group each case belongs to are not needed. For a fixed $i$, the cases are independent inhomogeneous Poisson processes with
\begin{align*}
\{Y_{ik}; k = 1 \ldots K_i\} \sim& \text{Poisson Process}\left[\off_i(s)\lambda(s)\right]\\
\off_i(s) = & |C_i| \hat\Gamma_i\sum_j \hat\Theta_j P_{ij}(s).
\end{align*}
The population $P_{ij}(s)$ of the $j$th group at the $i$th census is assumed to be evenly distributed within census regions. If $\bar P_{ijr}$ is the published count of inhabitants of group $j$ during census $i$ in region $D_{ir}$, then $P_{ij}(s) = \bar P_{ijr} / |D_{ir}|$ for $s \in D_{ir}$. The following section describes a nonparametric estimator for $\lambda(\cdot)$.

\subsection{Local-\textsc{EM} Estimation} 

For a given risk surface $\lambda(\cdot)$, the likelihood of a given set of observations $y_{ik}$ is \citep[see][]{illian2008statistical}
\begin{equation}\label{eq:ippLikelihood}
L(\lambda;\{y_{ik}\}) =  \prod_{i=1}^M \left[\prod_{k=1}^{K_i} \off_i(y_{ik})\lambda(y_{ik})\right] \exp\left[ -  \int_{\map_i} \off_i(u) \lambda(u) \dee u\right].
\end{equation}
Here, $\map_i$ is the extent of the study region at census $i$.  Maximizing (\ref{eq:ippLikelihood}) with respect to $\lambda(\cdot)$ would yield a zero-valued surface with infinite spikes where cases are observed. Local likelihood computes a nonparametric estimate of $\lambda(s)$ by giving more weight to the data close to $s$ when estimating at $s$.  

In the paper, the kernel function consists of the bivariate standard Normal density 
\[
K_h (x) = \frac{1}{h^2} f\left(\frac{x_1}{h} \right) f \left(\frac{x_2}{h} \right)
%\propto \exp\left[-\left(\frac{||x - s||}{h}\right)^2\right]\cdot
\]
where $f(\cdot)$ is the density of a standard Normal distribution. Although other kernel functions can be applied here, the optimal bandwidth size is the most important factor. Applying a kernel $K_h(u)$ with bandwidth $h$ to the log of the likelihood in (\ref{eq:ippLikelihood}) and bringing the kernel inside the sums and integral, yields
\begin{equation}\label{eq:ippLocalLikelihood}
\ell(\lambda;s) = \sum_{i=1}^M \left[\sum_{j=1}^{J_i}K_h(y_{ik}-s) \log(\lambda(y_{ik})) \right]  -  \int_{\map_i} K_h(u-s) \off_i(u) \lambda(u) \dee u + C.
\end{equation}
\noindent where the constant $C$ depends on the $\log(\off_i(Y_{ik}))$.  

Local likelihood enforces smoothness by specifying a parametric formula for $\lambda(\cdot)$ (i.e., the exponentiated polynomials of \citet{loader1999lra}), and these parameters are re-estimated for multiple locations $s$.  The simplest parametric model is the locally constant model, with $\lambda(s) = \exp(\lambda)$, which will be used here.  It would in theory be possible to use local linear or local polynomial models for $\lambda(s)$ \citep[see][]{steve}, though the locally constant model proved to be near the limits of current computational capabilities.

Here, the locations $Y_{ik}$ are unobserved, and the postal codes define a region $R_{ik}$ which the case must belong to. If the $R_{ik}$ are non-intersecting for all $i$ and $k$,  estimation of $\lambda(s)$ is well studied \citep[see][]{brillinger1990st}. Because the postal codes are overlapping, and the census regions where the offsets are defined are changing, a local-\textsc{EM} algorithm is used to maximize the expected likelihood in place of (\ref{eq:ippLocalLikelihood}). As the expectation depends on the intensity $\lambda$, the procedure is iterative.  

At iteration $r$, the intensity  $\lambda^{(r-1)}(\cdot)$ is used to take the expectation of $\ell(\lambda;s)$ conditioning on $R_{ik}$. By moving the expectation inside the sums and noting that the contributions from each $y_{ij}$ in a given $R_{ij}$ are identical, the local-\textsc{EM} recursion at iteration $r$ becomes
\begin{equation}
\label{eq:formalEM}
\ell^{(r)}(\lambda;s, R_{ik}) = \sum_{i=1}^M \sum_{k=1}^{K_i}  \text{E} \left[  K_h(Y_{ik}-s) \lambda  \Bigg |Y_{ik} \in R_{ik}\ ;\ \hat\lambda^{(r-1)}(\cdot)  \right] 
 -  \int_{\map_i} K_h(u-s) \off_i(u) \lambda \dee u.
\end{equation}

As shown in \cite{steve} and in the Apendix, we write $\bf\Lambda$ as the vector of $\Lambda_\ell$'s where
\begin{equation}\label{EMSic}
\hat{\bf \Lambda}^{(r+1)}={\mathfrak M}(\hat{\bf \Lambda}^{(r)}) {\cal K}.
\end{equation}
Here, ${\cal K}$ is an $L\times L$ smoothing matrix with entries
\begin{equation}\label{smoothstep}
{\cal K}_{\ell m}= \frac{1}{\area{Q_{\ell}}}\int_{Q_m}\frac{\int_{Q_\ell}K_h(u-s)\dee{u}}{\Psi(s)}\dee{s}
\end{equation}
and ${\mathfrak M}(\hat{\bf \Lambda}^{(p)})$ is a $L$ dimensional row vector whose $\ell$th entry is
\begin{equation}\label{EMstep}
[{\mathfrak M}(\hat{\bf \Lambda}^{(r)})]_\ell=\sum_{ik} y_{ik} \frac{\tilde\off_{i\ell} \hat{\Lambda}^{(r)}_{\ell} {\cal I}_{ik\ell}}{\sum_m \tilde\off_{im} \hat{\Lambda}^{(r)}_{m} {\cal I}_{ikm}}\cdot
\end{equation}
The above derivation is similar to that appearing in \citet{steve}, with the difference that here there is a distinction made between census regions $D_{ir}$ and postal regions $R_{ik}$.  Further, this iteration is identical to \cite{silvermanems}'s \textsc{EMS} algorithm, originally proposed  as an {\it ad hoc} method for improving the behaviour of the \textsc{EM} algorithm by including a smoothing step. Here, they are seen to arise formally from local likelihood considerations when data are area censored.

At the last iteration, the estimate of $\lambda$ is
\begin{equation}\label{EMSfinal}
\left\{ \sum_{ik\ell} y_{ik} \frac{\tilde\off_{i\ell} \hat{\Lambda}^{(r)}_{\ell} {\cal I}_{ik\ell}}{\sum_m \tilde\off_{im} \hat{\Lambda}^{(r)}_{m} {\cal I}_{ikm}}  \frac{\int_{Q_\ell}K_h(u-s)\dee{u}}{\area{Q_{\ell}}\Psi(s)} \right\} \cdot
\end{equation}

\subsection{Bandwidth Selection and Uncertainty Quantification}

In this paper, finding the optimal bandwidth size in the study area is a difficult problem because the cases were significantly greater in urban areas and the postal code regions were significantly larger in rural areas. As a result, variable bandwidth sizes are used, with smaller bandwidth sizes required for urban areas and larger bandwidth sizes required for rural areas. Since the generalized linear model intensity estimates (i.e., intensity estimates using infinite bandwidth size) of the risk surface for Lambton County and Middlesex County are 5.5178 and 1.0956, respectively, 4 optimal bandwidth sizes are computed for the urban and rural counterparts of Lambton County and Middlesex County. For regions near the boundaries of the urban and rural areas, the optimal bandwidth sizes are linear interpolated between the urban and rural optimal bandwidth sizes. 

The optimal bandwidth size is selected using the leave-one-map-out cross validation for both urban and rural areas. Since excluding a map alters the offsets, the cross validation requires the re-calculation of the smoothing matrix each time a map is left out. The construction of the smoothing matrices is limited to a few equal-distant bandwidth values. Since the data are Poisson distributed, the prediction error is estimated as 
\[
CV(h) = -\frac{1}{M}\sum\limits_{ik} \log g(y_{ik}; \rho_{ik}) 
\]
where $g(\cdot)$ is the density of the $\mbox{Poisson}(\rho_{ik})$ distribution, $\rho_{ik} = \off_i(s) \lambda^{-i} ||R_{ik}||$ and $\lambda^{-i}$ is the intensity computed after excluding the $i$th map. The optimal bandwidth size is the one that gives the smallest prediction error.

After the risk surface is estimated with the optimal bandwidth sizes, the uncertainty of this surface is quantified using parametric bootstrap. This detects real and artificial areas of high risk. The bootstrapped data are simulated under the assumption that the cases are a realisation from the background population and a constant risk surface over the study area of each map. For each map, cases are first simulated using a Poisson distribution to the census regions, and then, randomly assigned to postal code regions with weights inversely proportion to the number of census regions covered by the postal code regions. Afterwards, the data for each map are aggregated. 

Using the same smoothing matrix computed for the observed data, the local-\textsc{EM} algorithm is implemented on these aggregated data to estimate the bootstrapped risk surface. The exceedance probabilities are computed as the proportion of bootstrapped risk surface estimates less than the observed risk surface estimate (i.e., $\Pr(\hat{\lambda}_{boot} < \hat{\lambda}_{obs})$). Afterwards, these probabilities are plotted in the study area as the estimated risk surface. If the exceedance probabilities exceed 80\%, the locations are considered to be high risk under the assumed risk surface. As this risk surface increases, this provides more evidence that certain locations are of high risk. 

In this paper, the exceedance probabilities are computed using 200 simulated datasets to provide a \textbf{precision[better term for this]} of 0.005. More simulations can be done, but the overall results are not likely to change.

\section{Computation Considerations}

In this paper, most of the computations took place in evaluating the entries of the smoothing matrix (\ref{smoothstep}). The entries are evaluated over two double integrals, which is computationally intensive. It becomes more apparent with a larger number of partitions or optimal bandwidth size. Because a two-dimensional Gaussian kernel function is used in this paper, the inner double integral can be easily and precisely evaluated, but the outer double integral still has to be numerically approximated. Computations are done in the statistical software R \citep{r2011}. 

If the grid partitions are used, the outer double integral are easily approximated using the Gaussian Quadrature Rule. However, this requires a large fine grid such that the grid points overlay all census regions of each map in the study area. For example, if a $50 \times 50$m grid is overlayed onto the study area, approximately 2.5 million grid partitions would be created. Although the entries of this smoothing matrix would contain mostly zeroes, it is beyond any computational limitations. 

This motivates the overlaying of maps onto each other to construct polygon partitions for the study area. Using ArcMap 10, 2,826 irregular shaped and sized partitions are generated from the 5 maps for Lambton County and Middlesex County after dissolving several sliver and very small partitions into neighbouring partitions. To evaluate the entries of the smoothing matrix, several hundred equal-distant grid points are first overlayed on each partition polygon. Once the inner double integral is evaluated, the outer double integral is approximated using the Riemann Sum Rule. Although this approach is time consuming, it is computational feasible and the smoothing matrix has $2,826 \times 2,826$ entries for each bandwidth size.

One advantage of the local-\textsc{EM} algorithm over MCMC approximations in R is ability to use parallel computing. Parallel computing allows many calculations to be carried out simultaneously, based on the idea that a large task can be divided into smaller ones. With multicore processors being available for a long time, parallel programming is the feasible approach in analyzing very large datasets when using non-standard models. Utilizing more processors will yield faster results. The ``Rmpi'' package \citep{yu2001} allows the ability to parallel program in R. With parallel computing, each column of the smoothing matrix is evaluated simultaneously on a separate processor, and then, combined to generate the smoothing matrix. 

%It can also be applied in estimating the risk surface (\ref{EMSfinal}), where the points can also be evaluated simultaneously, and then, combined to generate an estimate of the risk surface. 

Computations are performed on the General Purpose Cluster (GPC) supercomputer at the SciNet HPC Consortium \citep{scinet}. SciNet is funded by: the Canada Foundation for Innovation under the auspices of Compute Canada; the Government of Ontario; Ontario Research Fund - Research Excellence; and the University of Toronto. In this paper, 64 Intel Xeon E5540@2.53GHz quad core processors are used, allowing the computations to be done approximately 256 times faster than on a single processor.

\section{Simulation Study}

A simulation study is conducted to assess the performance of the local-\textsc{EM} algorithm under these scenarios presented in this paper. Each sample is simulated using the maps and data of Lambton County. Overlaying the 5 maps of Lambton County generated 870 partitions of varying shapes and sizes. Variable bandwidth sizes are used to construct the smoothing matrix by finding the optimal bandwidth sizes of the urban and rural areas. For the urban areas, 40 equal-distant bandwidth sizes between 50m and 2km are used, while for the rural areas, 39 equal-distant bandwidth sizes between 1km and 20km are used.

Risk surfaces are simulated using a Whittle Matern Gaussian random field with mean of 1.7, based on the log of the generalized linear model intensity estimate, and variance of 3.7, based on the variance of the log intensity estimates computed from the optimal bandwidth sizes. The range parameter, which controls the smoothness of the risk surface, is set to be 5km, 10km and 20km. For each range parameter, 20 samples are simulated. Cases are first simulated at a pixelated level using a Poisson distribution. These cases are aggregated to the census regions, and then, randomly assigned to postcode regions with probabilities inversely proportion to the number of census regions covered by the postcode regions. Afterwards, the local-\textsc{EM} algorithm is implemented to estimate the risk surface of the simulated dataset. 

The Receiver Operating Characteristic (ROC) curve is used to evaluate the algorithm's ability in identifying the true areas of high risk. The curve is calculated at the pixelated level, with the sensitivity corresponding to the probability that a given pixel having excess risk is correctly identified and specificity corresponding to the probability that a given pixel having low risk is correctly identified. For each range parameter, sensitivity and specificity are computed for exceedance risks of 1, 2, $\mu$, $1.5\mu$ and $2\mu$, where $\mu = 5.5178$, at 201 equal-distant thresholds between 0 and 1, and then, averaged over the 20 samples. Figure \ref{fig:roc} shows the ROC curves of the simulation results. The local-\textsc{EM} algorithm performs well in detecting true areas of high risk for the specified exceedance risk, particularly with larger range parameters (i.e., smoother risk surfaces).

\begin{figure}[htp]
\centering
\subfigure[Exceedance risk of 1]{\includegraphics[width=0.31\textwidth]{rocLambton_risk1.pdf}}
\subfigure[Exceedance risk of 2]{\includegraphics[width=0.31\textwidth]{rocLambton_risk2.pdf}}
\subfigure[Exceedance risk of $\mu$]{\includegraphics[width=0.31\textwidth]{rocLambton_risk5.pdf}}
\subfigure[Exceedance risk of $1.5\mu$]{\includegraphics[width=0.31\textwidth]{rocLambton_risk8.pdf}}
\subfigure[Exceedance risk of $2\mu$]{\includegraphics[width=0.31\textwidth]{rocLambton_risk11.pdf}}
\caption{ROC curves for range parameters of 5km, 10km and 20km.}
\label{fig:roc}
\end{figure}

\section{Mesothelioma Study}

For the urban areas in Lambton County, the cross-validation values are computed using 40 equal-distant bandwidth values between 50m and 2km, while for the urban areas in Middlesex County, they are computed using 12 equal-distant bandwidth values between 500m and 6km. For the rural areas in Lambton County and Middlesex County, 39 equal-distant bandwidth values between 1km and 20km are used. The resulting cross-validation values for each area are plotted in Figure \ref{fig:cv}. According to these plots, the optimal bandwidth sizes for the urban areas of Lambton County and Middlesex County are 450m and 3.5km, respectively. The optimal bandwidth sizes for the rural areas of Lambton County and Middlesex County are 5km and 6km, respectively; since the cross-validation values are quite flat near their respective optimal bandwidth sizes, the bandwidth size of 6km is used for both rural areas. 

\begin{figure}[htp]
\centering
\subfigure[Urban areas for Lambton County]{\includegraphics[trim = 0mm 0mm 0mm 15mm, clip, width=0.45\textwidth]{cvInsideSarnia.pdf}}
\subfigure[Rural areas for Lambton County]{\includegraphics[trim = 0mm 0mm 0mm 15mm, clip, width=0.45\textwidth]{cvOutsideSarnia.pdf}}
\subfigure[Urban areas for Middlesex County]{\includegraphics[trim = 0mm 0mm 0mm 15mm, clip, width=0.45\textwidth]{cvInsideLondon.pdf}}
\subfigure[Rural areas for Middlesex County]{\includegraphics[trim = 0mm 0mm 0mm 15mm, clip, width=0.45\textwidth]{cvOutsideLondon.pdf}}
\caption{Cross-validation values as a function of bandwidth size.}
\label{fig:cv}
\end{figure}

Figure \ref{fig:risk} shows the estimated risk surface and exceedance probabilities for the risk of 1, 2, 5, $\mu$, $1.5\mu$ and $2\mu$ using the local-\textsc{EM} algorithm with the optimal bandwidth sizes computed for the study area. As evidently shown in the plots, Sarnia and areas in its vinicity have substantially elevated risk than the rest of the study area. The areas in vinicity of Sarnia are of great concern, having at least double the risk of the generalized linear model intensity estimate of Lambton County. 

\begin{figure}[htp]
\centering
\subfigure[Estimated risk surface]{\includegraphics[trim = 20mm 20mm 10mm 20mm, clip, width=0.45\textwidth]{lem_riskSurfaceLM.png}}
\subfigure[Risk surface of 1]{\includegraphics[trim = 20mm 20mm 10mm 20mm, clip, width=0.45\textwidth]{boot_riskSurfaceLM_risk1.png}}
\subfigure[Risk surface of 2]{\includegraphics[trim = 20mm 20mm 10mm 20mm, clip, width=0.45\textwidth]{boot_riskSurfaceLM_risk2.png}}
\subfigure[Risk surface of $\mu$]{\includegraphics[trim = 20mm 20mm 10mm 20mm, clip, width=0.45\textwidth]{boot_riskSurfaceLM_risk5.png}}
\subfigure[Risk surface of $1.5\mu$]{\includegraphics[trim = 20mm 20mm 10mm 20mm, clip, width=0.45\textwidth]{boot_riskSurfaceLM_risk8.png}}
\subfigure[Risk surface of $2\mu$]{\includegraphics[trim = 20mm 20mm 10mm 20mm, clip, width=0.45\textwidth]{boot_riskSurfaceLM_risk11.png}}
\caption{Esimated risk surface and exceedance probabilities for mesothelioma risk in Lambton County and Middlesex County.}
\label{fig:risk}
\label{fig:boot}
\end{figure}

\section{Discussion}

In this paper, the local-\textsc{EM} algorithm, proposed by \citet{steve}, is improved upon to estimate intensity risk in larger study areas, such as a county or province, where one-to-one correspondence between census regions and postal code regions is not assumed. As well, the computations are also improved with the use of polygon partitions and parallel computing. Simulation results indicate that the algorithm performs well, particularly with smoother surfaces; however, no methods currently exist to compare with these results. 

Lambton County is known to have a higher number of mesothelioma cases than the rest of Ontario, probably due to the numerous industrial factories located in the county. The local-\textsc{EM} algorithm shows the areas of elevated risk, which is in Sarnia and its vicinity. These are the areas that need to be focused on. We tried to analyze the risk over time in Lambton County by dividing the postal code regions within zones appropriate for risk surface. We had little success with proper allocation of the regions and cases within the zones for each map, so these analyses are not included. A spatio-temporal model that can handle spatio-temporal interactions is required.



\bibliography{llems}

\appendix

\section{EMS Derivation}
The maximum of (\ref{eq:formalEM}), achieved by differentiating (\ref{eq:formalEM}) with respect to $\lambda$ and setting to zero, is
\newcommand{\thedenom}{\sum_{i}\int_{{\cal M}_i}\off_i(u) K_h(u-s) \dee u}
\begin{equation}\label{eq:efficientEMS}
\hat{\lambda}^{(r)}(s)=\sum_{ik} \text{E}[K_h(Y_{ik}-s) | Y_{ik} \in R_{ij};\hat\lambda^{(r-1)}(\cdot)]\left/
\thedenom
%\Psi(\hat{\coef}^{(r)}_{-1}(s))
\right. .
\end{equation}
The updated intensity surface, $\hat{\lambda}^{(r)}(s)$, is obtained by solving (\ref{eq:efficientEMS}) for many different values of $s$. The algorithm differs from a typical \textsc{EM} algorithm because, at the \textsc{E}-step, expectation is computed with respect to an estimate of the infinite dimensional parameter $\lambda(\cdot)$ while, at the \textsc{M}-step, we only estimate this parameter locally at $s$. 

It is not possible to evaluate $\hat{\lambda}^{(r)}(s)$ for the infinite number of possible $s$, and the risk surface $\lambda(\cdot)$ is therefore discretized as a piecewise constant function $\bar\lambda(\cdot)$ over disjoint regions ${\cal Q}=\{Q_\ell; \ell=1 \ldots,L\}$. The $Q_\ell$ must have the property that they cover the entire study region $\map=\cup_{i=1}^M \map_i$ and  $Q_\ell$ does not intersect with any of the census regions $D_{ir}$ or postal code regions $R_{ik}$ (though there would necessarily be a number of common boundaries).  The smallest possible ${\cal Q}$ is obtained by overlaying the boundaries of the five census maps. This is the approach used here, and we refer to ${\cal Q}$ as the {\em partition} of the census maps $\{\map_i;i=1\ldots M\}$. 

The discretized risk surface is then $\bar\lambda(s) = \bar\lambda_\ell; s \in Q_\ell$. Here, $\bar\lambda$ may be formally referred to as the $\mathcal{Q}$-approximant of $\lambda$ (see \cite{royden1988ra} for details). As the populations are constant within census regions $D_{ir}$, the offsets $\off_i(s)$ are constant within partitions $Q_\ell$ and we write $\off_i(s) = \tilde\off_{i\ell}; s \in Q_\ell$. The estimates of $\bar\lambda_\ell$ and offsets $\tilde\off_{i\ell}$ are used in place of $\hat\lambda(\cdot)$ and $\off_i(\cdot)$ when calculating the conditional expectations in (\ref{eq:formalEM}), with the effect that a finite number of quantities $\bar\lambda_\ell$ need to be estimated rather than the infinite-dimensional $\lambda(\cdot)$.

Substituting $\Lambda_\ell=||Q_\ell||\bar\lambda_\ell$ in place of $\lambda_\ell$ and writing ${\cal I}_{ij\ell}$ as an indicator function for $R_{ik}$ intersecting with $Q_\ell$, the conditional density of an event $Y$ within region $R_{ik}$ is 
\[
\Pr(Y=y | Y \in R_{ik};\bar\lambda) = \tilde\off_{i\ell} \Lambda_\ell {\cal I}_{ijl} \left/ ||Q_\ell|| \sum_{m=1}^L \tilde\off_{im} \Lambda_m {\cal I}_{ikm}\ ; y \in Q_\ell \right. .
\]
\noindent The conditional expectation from (\ref{eq:efficientEMS}), using $\bar\lambda$ in place of $\lambda$ is then
\begin{eqnarray}
\text{E} \left[K_h(Y-s)  | Y \in R_{ik} ; \bar\lambda \right] &=& \int_{R_{ik}} K_h(y-s) \Pr(y|Y\in R_{ik};\bar\lambda) \dee y \nonumber \\ 
&=& \sum_\ell \frac{\tilde\off_{i\ell} {\Lambda}_{\ell} {\cal I}_{ij\ell} }{\sum_m \tilde\off_{im} \Lambda_{m} {\cal I}_{ikm}} \int_{Q_\ell} K_h(y-s)  / ||Q_\ell||  \dee  y \nonumber \\
&=&\sum_{\ell} \frac{\tilde\off_{i\ell} {\Lambda}_{\ell} {\cal I}_{ij\ell} \int_{Q_\ell} K_h(y-s) \dee{y}}{\area{Q_{\ell}}\sum_m \tilde\off_{im} {\Lambda}_{m} {\cal I}_{ikm}}\cdot \label{Cev}
\end{eqnarray}
Substituting (\ref{Cev}) into expression (\ref{eq:efficientEMS}) leads to the iteration
\begin{eqnarray*}
\hat{\Lambda}^{(r+1)}_\ell&=&\int_{Q_\ell} \hat{\lambda}^{(r+1)}(s)\, \dee s\\
%
&=&\int_{Q_\ell} \left\{ \sum_{ikm} y_{ik} \frac{\tilde\off_{im} \hat{\Lambda}^{(r)}_{m}{\cal I}_{ikm}}{\area{Q_m} \sum_n \tilde\off_{in} \hat{\Lambda}^{(r)}_{n} {\cal I}_{ijn}} \frac{\int_{Q_m} K_h({x-s})\dee{x}}{\Psi(s)}\right \} \dee{s}\\
%
&=&\sum_{ikm} y_{ik} \frac{\tilde\off_{im} \hat{\Lambda}^{(r)}_{m} {\cal I}_{ikm}}{\area{Q_m}\sum_n \tilde\off_{in} \hat{\Lambda}^{(r)}_{n} {\cal I}_{ijn}} \int_{Q_\ell} \frac{\int_{Q_m}K_h(y-s)\dee{y}}{\Psi(s)}\dee{s}.
\end{eqnarray*}
where
\[
\Psi(s) = \thedenom= \sum_{im} \tilde\off_{im} \int_{Q_m} K_h(u-s) \dee u.
\]

\end{document}
